Communication-Efficient Distributed Stochastic Gradient Descent with Butterfly Mixing
Authors
Abstract
Stochastic gradient descent is a widely used method for finding locally optimal models in machine learning and data mining. However, it is inherently a sequential algorithm, and parallelizing it involves severe compromises because the cost of synchronizing across a cluster is much larger than the time required to compute an optimal-sized gradient step. Here we explore butterfly mixing, where gradient steps are interleaved with the k stages of a butterfly network on 2^k nodes. UDP-based butterfly mix steps should be extremely fast and failure-tolerant, and convergence is almost as fast as a full mix (AllReduce) on every step.
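To make the mixing pattern concrete, here is a minimal single-process sketch of butterfly mixing, assuming 2^k simulated nodes, a toy least-squares objective, and plain parameter averaging at each mix step. The helper names (butterfly_partner, local_gradient, butterfly_sgd) and the learning-rate and step-count defaults are illustrative assumptions, not taken from the paper.

```python
# Minimal simulation sketch of butterfly-mixed SGD (assumptions noted above).
import numpy as np

def butterfly_partner(node, step, k):
    # At mix step t, node i exchanges with the node whose index differs
    # in bit (t mod k): partner = i XOR 2**(t mod k).
    return node ^ (1 << (step % k))

def local_gradient(w, X, y):
    # Gradient of the least-squares toy objective 0.5 * ||Xw - y||^2 / n.
    return X.T @ (X @ w - y) / len(y)

def butterfly_sgd(X_shards, y_shards, k, steps, lr=0.1):
    # X_shards[i], y_shards[i]: the data shard held by node i (2**k shards total).
    n_nodes = 2 ** k
    dim = X_shards[0].shape[1]
    models = [np.zeros(dim) for _ in range(n_nodes)]
    for t in range(steps):
        # Local gradient step on each node's own shard.
        for i in range(n_nodes):
            models[i] = models[i] - lr * local_gradient(models[i], X_shards[i], y_shards[i])
        # One butterfly mix step: average with the partner along dimension t mod k.
        models = [0.5 * (models[i] + models[butterfly_partner(i, t, k)])
                  for i in range(n_nodes)]
    return models
```

After k consecutive mix steps, each node's parameters have incorporated contributions from all 2^k nodes, which is why interleaving single butterfly stages with gradient steps can approach the convergence behavior of a full AllReduce while exchanging data with only one partner per step.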
Similar Papers
An efficient distributed learning algorithm based on effective local functional approximations
Scalable machine learning over big data stored on a cluster of commodity machines with significant communication costs has become important in recent years. In this paper we give a novel approach to the distributed training of linear classifiers (involving smooth losses and L2 regularization) that is designed to reduce communication costs. At each iteration, the nodes minimize approximate objec...
A Functional Approximation Based Distributed Learning Algorithm
Scalable machine learning over big data stored on a cluster of commodity machines with significant communication costs has become important in recent years. In this paper we give a novel approach to the distributed training of linear classifiers (involving smooth losses and L2 regularization) that is designed to reduce communication costs. At each iteration, the nodes minimize approximate objec...
Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
We present and study a distributed optimization algorithm that employs a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often outperform stochastic gradient descent methods on regularized loss minimization problems, yet little effort has gone into studying them in a distributed framework. We ...
Splash: User-friendly Programming Interface for Parallelizing Stochastic Algorithms
Stochastic algorithms are efficient approaches to solving machine learning and optimization problems. In this paper, we propose a general framework called Splash for parallelizing stochastic algorithms on multi-node distributed systems. Splash consists of a programming interface and an execution engine. Using the programming interface, the user develops sequential stochastic algorithms without ...
Asynchronous Peer-to-Peer Data Mining with Stochastic Gradient Descent
Fully distributed data mining algorithms build global models over large amounts of data distributed over a large number of peers in a network, without moving the data itself. In the area of peer-to-peer (P2P) networks, such algorithms have various applications in P2P social networking, and also in trackerless BitTorrent communities. The difficulty of the problem involves realizing good quality ...